ABSTRACT


The Cleanroom software development approach is intended to produce highly reliable
software by integrating formal methods for specification and design, complete off-line
development, and statistically based testing. In an empirical study, 15 three-person
teams developed versions of the same software system (800 - 2300 source lines); ten
teams applied Cleanroom, while five applied a more traditional approach. This analysis
characterizes the effect of Cleanroom on the delivered product, the software development
process, and the developers. The major results of this study are 1) most developers
were able to apply the techniques of Cleanroom effectively; 2) the Cleanroom teams'
products met system requirements more completely and had a higher percentage of
successful test cases; 3) the source code developed using Cleanroom had more comments
and less dense complexity; 4) the use of Cleanroom successfully modified aspects of
development style; and 5) most Cleanroom developers indicated they would use the
approach again.

1. Introduction


The need for discipline in the software development process and for high quality
software motivates the Cleanroom software development approach. In addition to
improving the control during development, this approach is intended to deliver a product
that meets several quality aspects: a system that conforms with the requirements, a
system with high operational reliability, and source code that is easily readable and
modifiable.

Section II describes the Cleanroom approach and a framework of goals for characterizing
its effect. Section III presents an empirical study using the approach. Section
IV gives the results of the analysis comparing projects developed using Cleanroom with
those of a control group. The overall conclusions appear in Section V.

2. Cleanroom Software Development

The Federal Systems Division of IBM [Dyer 82, Dyer & Mills 82] presents the
Cleanroom software development method as a technical and organizational approach to
developing software with certifiable reliability. The idea is to deny the entry of defects
during the development of software, hence the term "Cleanroom." The focus of the
method is imposing discipline on the development process by integrating formal methods
for specification and design, complete off-line development, and statistically based testing.
These components are intended to contribute to a software product that has a high
probability of zero defects and consequently a high measure of operational reliability.

The mathematically based design methodology of Cleanroom includes the use of
structured specifications and state machine models [Ferrentino & Mills 77]. A systems
engineer introduces the structured specifications to restate the system requirements precisely
and organize the complex problems into manageable parts [Parnas 72]. The
specifications determine the "system architecture" of the interconnections and groupings
of capabilities to which state machine design practices can be applied. System implementation
and test data formulation can then proceed from the structured specifications
independently.

The right-the-first-time programming methods used in Cleanroom are the ideas of
functionally based programming in [Mills 72b, Linger, Mills & Witt 79]. The testing
process is completely separated from the development process by not allowing the
developers to test and debug their programs. The developers focus on the techniques of
code inspections [Fagan 76], group walkthroughs [Myers 78], and formal verification
[Hoare 69, Linger, Mills & Witt 79, Shankar 82, Dyer 83] to assert the correctness of
their implementation. These constructive techniques apply throughout all phases of
development, and condense the activities of defect detection and isolation into one
operation. This discipline is imposed with the intention that correctness is "designed"
into the software, not "tested" in. The notion that "Well, the software should always
be tested to find the faults" is eliminated.

In the statistically based testing strategy of Cleanroom, independent testers simulate
the operational environment of the system with random testing. This testing process
includes defining the frequency distribution of inputs to the system, the frequency
distribution of different system states, and the expanding hierarchy of developed system
capabilities. Test cases then are chosen randomly and presented to the series of product
releases, while concentrating on functions most recently delivered and maintaining the
overall composite distribution of inputs. The independent testers then record observed
failures and determine an objective measure of product reliability. It is believed that
the prior knowledge that a system will be evaluated by random testing will affect system
reliability by imposing a new discipline on the system developers.
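
The random selection of test cases from a frequency distribution of inputs can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the study: the function names and relative frequencies in the profile are hypothetical, and Python's standard `random.choices` stands in for whatever sampling mechanism the independent testers actually used.

```python
import random

# Hypothetical operational profile: logical function -> relative frequency.
# The function names and weights are illustrative assumptions only.
OPERATIONAL_PROFILE = {
    "read_mail": 0.40,
    "send_mail": 0.30,
    "respond": 0.15,
    "add_to_mailing_list": 0.10,
    "remove_from_mailing_list": 0.05,
}


def draw_test_cases(profile, count, seed=None):
    """Draw `count` function names at random, weighted by the profile."""
    rng = random.Random(seed)  # a fixed seed makes a test session repeatable
    functions = list(profile)
    weights = [profile[name] for name in functions]
    return rng.choices(functions, weights=weights, k=count)


if __name__ == "__main__":
    for name in draw_test_cases(OPERATIONAL_PROFILE, count=10, seed=1):
        print(name)
```

Seeding the generator is a design choice made here so that a failing session can be replayed; the paper does not say whether test sessions were reproducible in this way.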

2.1. Investigation Goals

Some intriguing aspects of the Cleanroom approach include 1) development without
testing and debugging of programs, 2) independent program testing for quality
assurance (rather than to find faults or to prove "correctness" [Howden 76]), and 3)
certification of system reliability before product delivery. In order to understand the
effects of using Cleanroom, the following three goals are proposed: 1) characterize the
effect of Cleanroom on the delivered product, 2) characterize the effect of Cleanroom on
the software development process, and 3) characterize the effect of Cleanroom on the
developers. An application of the goal/question/metric paradigm [Basili & Selby 84,
Basili & Weiss 84] leads to the framework of goals and questions for this study appearing
in Figure 1. The empirical study executed to pursue these goals is described in the
following section.

3. Empirical Study Using Cleanroom

This section describes an empirical study comparing team projects developed using
Cleanroom with those using a more conventional approach.

3.1. Case Study Description


Subjects for the empirical study came from the "Software Design and Development"
course taught by F. T. Baker and V. R. Basili at the University of Maryland in
the Falls of 1982 and 1983. The initial segment of the course was devoted to the
presentation of several software development methodologies, including top-down design,
modular specification and design, PDL, chief programmer teams, program correctness,
code reading, walkthroughs, and functional and structural testing strategies. For the
latter part of the course, the individuals were divided into three-person chief programmer
teams for a group project [Baker 72, Mills 72a, Baker 81]. We attempted to divide
the teams equally according to professional experience, academic performance, and
implementation language experience. The subjects had an average of 1.0 years of professional
experience and were computer science majors with junior, senior, or graduate
standing. Figure 2 displays the distribution of the subjects' professional experience.


Figure 2. Subjects' professional experience in years.

A requirements document for an electronic message system (read, send, mailing
lists, authorized capabilities, etc.) was distributed to each of the teams. The project was
to be completed in six weeks and was expected to be about 1200 lines of Simpl-T source
[Basili & Turner 76].¹ The development machine was a Univac 1100/82 running EXEC
VIII, with 1200 baud interactive and remote access available.

The ten teams in the Fall 1982 course applied the Cleanroom software development
approach, while the five teams in the Fall 1983 course served as a control group (non-Cleanroom).
All other aspects of the developments were the same. The two groups of
teams were not statistically different in terms of professional experience, academic performance,
or implementation language experience. If there were any bias between the
two times the course was taught, it would be in favor of the 1983 (non-Cleanroom)
group because the modular design portion of the course was presented earlier. It was
also the second time F. T. Baker had taught the course. Note that the teams in the
non-Cleanroom group applied a development approach similar to the "disciplined team"
approach examined in an earlier study [Basili & Reiter 81].

The first document every team in either group turned in contained a system
specification, composite design diagram, and implementation plan. The latter element
was a series of milestones describing when the various functions within the system
would be available. At these various dates (minimum one week apart, maximum two),
teams from both groups would then submit their systems for testing. An independent
party would then apply statistically based testing to each of these deliveries and report
to the team members both the successful and unsuccessful test cases. The latter would

¹ Simpl-T is a structured language that supports several string and file handling
primitives, in addition to the usual control flow constructs available, for example, in
Pascal. If Pascal or FORTRAN had been chosen, it would have been very likely that
some individuals would have had extensive experience with the language, and this would
have biased the comparison. Also, restricting access to a compiler that produced executable
code would have been very difficult.

be included in the next test session for verification. Recall that the Cleanroom teams
could not execute their programs; they had editing and syntax-checking capabilities
only. They had to rely on the techniques of code reading, structured walkthroughs, and
inspections to prepare their programs before submission. On the other hand, the non-Cleanroom
teams had full access to compilation and execution facilities to test their systems
prior to independent testing.

All team projects were evaluated on the use of the development techniques
presented in class, the independent testing results, and a final oral interview. In addition
to these sources, information on the team projects was collected from a background
questionnaire, a postdevelopment attitude survey, static source code analysis, and
operating system statistics. The following section briefly describes the operationally
based testing process applied to all projects by the independent tester.

3.2. Operational Testing of Projects

The testing approach used in Cleanroom is to simulate the developing system's
environment by randomly selecting test data from an "operational profile," a frequency
distribution of inputs to the system [Thayer, Lipow & Nelson 78, Duran & Ntafos 81].
The projects from both groups were tested interactively at the milestones chosen by
each team by an independent party (i.e., R. W. Selby). A distribution of inputs to the
system was obtained by identifying the logical functions in the system and assigning
each a frequency. This frequency assignment was accomplished by polling eleven well-seasoned
users of the University of Maryland Vax 11/780 mailing system. Then test
data were generated randomly from this profile and presented to the system. Recording

Figure 11 displays the replies of the developers when they were asked how their
design and coding style was affected by not being able to test and debug. At first it
would seem surprising that more people did not modify their development style when


4.2.1. Summary of the Effect on the Development Process

Summarizing the effect on the development process, Cleanroom developers 1) felt
they applied off-line review techniques more effectively, while non-Cleanroom teams
focused on functional testing; 2) spent less time on-line and used fewer computer
resources; and 3) made all their scheduled deliveries.

4.3. Characterization of the Effect on the Developers

The first question posed in this goal area is whether the individuals using Cleanroom
missed the satisfaction of executing their own programs. Figure 9 presents the
responses to a question included in the postdevelopment attitude survey on this issue.
As might be expected, almost all the individuals missed some aspect of program execution.
As might not be expected, however, this missing of program execution had no
relation to either the product quality measures mentioned earlier or the teams' professional
or testing experience. Also, missing program execution did not increase with
respect to program size (see Figure 10).

Schedule slippage continues to be a problem in software development. It would be
interesting to see whether the Cleanroom teams demonstrated any more discipline by
maintaining their original schedules. All of the teams from both groups planned four
releases of their evolving system, except for team 'G' which planned five. Recall that at
each delivery an independent party would operationally test the functions currently
available in the system, according to the team's implementation plan. In Figure 8, we
observe that all the teams using Cleanroom kept to their original schedules by making
all planned deliveries; only two non-Cleanroom teams made all their scheduled
deliveries.


Figure 8. Number of system releases.

⁹ Non-Cleanroom team 'e' entered a substantial portion of its system on a remote
machine, using the Univac computer mainly for compilation and execution. (See
Distinction Among Teams.)

were unable to rely on testing methods, they may have (felt they had) applied the off-line
review techniques more effectively.

Since the role of the computer is more controlled when using Cleanroom, one would
expect a difference in on-line activity between the two groups. Figure 7 displays the
amount of connect time that each of the teams cumulatively used. A comparison of the
CPU time used by the teams was less statistically significant (MW = .110). Neither of
these measures of on-line activity related to how effectively a team felt they had used
the off-line techniques when either all teams or just Cleanroom teams were considered.
Although non-Cleanroom team 'd' did a lot of on-line testing and non-Cleanroom team
'e' did little, both teams performed poorly in the measures of operational product quality
discussed earlier. The operating system of the development machine captured these
system usage statistics. Note that the time the independent party spent testing is
included.⁸ These observations show that Cleanroom developers spent less time on-line
and used fewer computer resources. These results empirically support the reduced role
of the computer in Cleanroom development.

Figure 7. Connect time in hours during project development.⁹

⁸ When the time the independent tester spent is not included, the significance levels
for the non-parametric statistics do not change.

Figure 6. Breakdown of responses to the attitude survey question, "Did you feel
that you and your team members effectively used off-line review techniques in
testing your project?" (Responses are from Cleanroom teams.)⁷

14 - Yes, they were effective for testing all parts of the program
5.5 - We used them but felt that they were only appropriate for certain parts of the program
8.5 - We used them occasionally, but they were not really a major contributing factor to the development
0 - Did not really use them at all

[Histogram: feeling of effective use of off-line review techniques, both groups
(team 'e' does not appear because of lack of response).]
The histogram in Figure 6 shows that the Cleanroom developers felt they applied
the off-line review techniques more effectively than did the non-Cleanroom teams. The
non-Cleanroom developers were asked to give a relative breakdown of the amount of
time spent applying testing and verification techniques. Their aggregate response was
39% off-line review, 52% functional testing, and 9% structural testing. From this
breakdown, we observe that the non-Cleanroom teams primarily relied on functional
testing to prepare their systems for independent testing. Since the Cleanroom teams

⁷ There are half-responses because an individual checked both the second and third
choices. The responses total to 28, not 30, because two separate teams lost a member
late in the project. (See Distinction Among Teams.)

the teams (R = .58; signif. = .023). Neither professional nor testing experience correlated
with off-line review effectiveness when either all teams or just Cleanroom teams
were considered.

4.1.4. Summary of the Effect on the Product Developed


In summary, Cleanroom developers delivered a product that 1) met system requirements
more completely, 2) had a higher percentage of successful test cases, 3) had more
comments and less dense complexity, and 4) used more global data items and a higher
percentage of assignment statements. The more successful Cleanroom developers 1)
used more procedure calls and if statements, 2) used fewer case and while statements, 3)
reused variables less frequently, 4) developed subroutines requiring less (software science)
effort to comprehend, and 5) had more general programming language experience.

4.2. Characterization of the Effect on the Development Process

In a postdevelopment attitude survey, the developers were asked how effectively
they felt they applied off-line review techniques in testing their projects (see Figure 6).
This was an attempt to capture some of the information necessary to answer the first
question under this goal (question II.A). In order to make comparisons at the team
level, the responses from the members of a team are composed into an average for the
team. The responses to the question appear on a team basis in a histogram in the
second part of the figure. Of the Cleanroom developers, teams 'A,' 'D,' 'E,' 'F,' and 'I'
were the least confident in their use of the off-line review techniques and these teams
also performed the worst in terms of operational testing results; four of these five teams
performed the worst in terms of implementation completeness. Off-line review
effectiveness correlated with percentage of successful operational tests (without duplicate
failures) for the Cleanroom teams (Spearman R = .74; signif. = .014) and for all the
teams (R = .79; signif. = .001); it correlated with implementation completeness for all

Considering the products from all teams, both percentage of successful test cases
(without duplicate failures) and implementation completeness had some correlation with
percentage of if statements (R = .48, signif. = .07, and R = .45, signif. = .09, respectively)
and some negative correlation with percentage of case statements (R = -.48, signif.
= .07, and R = -.42, signif. = .12, respectively). Neither of the operational product
quality measures correlated with percentage of assignment statements when either
all products or just Cleanroom products were considered. These observations suggest
that the more successful Cleanroom developers simplified their use of the implementation
language; i.e., they used more procedure calls and if statements, used fewer case
and while statements, had a lower frequency of variable reuse, and wrote subroutines
requiring less software science effort to comprehend.
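
The Spearman R values quoted in this analysis are rank correlations. As a hedged illustration of the statistic itself (not of the authors' tooling, which the paper does not describe), the sketch below ranks each sample with tied values sharing their mean rank and then takes the Pearson correlation of the ranks. The function names are assumptions introduced here, and the sketch does not compute significance levels.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman_r(xs, ys):
    """Spearman R: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (var_x * var_y) ** 0.5
```

A team-level comparison would pass one value per team in each list, for example a team's percentage of case statements and its percentage of successful test cases.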

4.1.3. Contribution of Programmer Background

When examining the contribution of the Cleanroom programmers' background to
the quality of their final products, general programming language experience correlated
with percentage of successful operational tests (without duplicate failures: Spearman R
= .66, signif. = .04; with duplicates: R = .70, signif. = .03) and with implementation
completeness (R = .55; signif. = .10). No relationship appears between either operational
testing results or implementation completeness and either professional⁶ or testing
experience. These background/quality relations seem consistent with other studies
[Curtis 83].

⁶ In fact, there are very slight negative correlations between years of professional experience
and both percentage of successful tests (without duplicate failures: R = -.40,
signif. = .18) and implementation completeness (R = -.47, signif. = .17).


lower complexity density (MW = .079) than did those using the traditional approach.
A calculation of either software science effort [Halstead 77], cyclomatic complexity
[McCabe 78], or syntactic complexity without any size normalization, however, produced
no significant differences (MW > .10). This seems as expected because all the systems
were built to meet the same requirements.
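
The MW values reported throughout are one-tailed Mann-Whitney significance levels. The sketch below shows one way such a value can be obtained for small groups: an exact enumeration of all ways to relabel the pooled ranks. This is an assumption made for illustration, since the paper does not say whether exact tables or an approximation were used, and the function names are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks, with ties sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for position in range(start, end + 1):
            ranks[order[position]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_one_tailed(group_a, group_b):
    """Return (U, p) for the alternative that group_a tends to be larger.

    U counts the (a, b) pairs with a > b, with ties counted as one half.
    p is the exact share of relabelings of the pooled ranks whose rank sum
    for group_a is at least as large as the observed one.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both groups must be non-empty")
    ranks = midranks(list(group_a) + list(group_b))
    observed = sum(ranks[:n_a])
    u_stat = observed - n_a * (n_a + 1) / 2
    total = 0
    at_least = 0
    for chosen in combinations(ranks, n_a):
        total += 1
        if sum(chosen) >= observed - 1e-9:  # tolerance for float rank sums
            at_least += 1
    return u_stat, at_least / total
```

With ten Cleanroom and five non-Cleanroom teams the enumeration visits 3003 relabelings, so the exact approach is cheap at this sample size.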

Comparing the data usage in the systems, Cleanroom developers used a greater
number of global data items (MW = .071). Also, Cleanroom projects possessed a higher
percentage of assignment statements (MW = .056). These last two observations could
be a manifestation of teaching the Cleanroom subjects modular design later in the
course (see Case Study Description), or possibly an indication of using the approach.

Some interesting observations surface when the operational quality measures of the
Cleanroom products are correlated with the usage of the implementation language.
Both percentage of successful test cases (without duplicate failures) and implementation
completeness correlated with percentage of procedure calls (Spearman R = .65, signif.
= .044, and R = .57, signif. = .08, respectively) and with percentage of if statements
(R = .62, signif. = .058, and R = .55, signif. = .10, respectively). However, both of
these two product quality measures correlated negatively with percentage of case statements
(R = -.86, signif. = .001, and R = -.69, signif. = .027, respectively) and with
percentage of while statements (R = -.65, signif. = .044, and R = -.49, signif. = .15,
respectively). There were also some negative correlations between the product quality
measures and the average software science effort per subroutine (R = -.52, signif. =
.12, and R = -.74, signif. = .013, respectively) and the average number of occurrences
of a variable (R = -.54, signif. = .11, and R = -.56, signif. = .09, respectively).

operational testing by independent testers. Since both groups of teams had independent
testing of all their deliveries, the early testing of deliveries must have revealed most
faults overlooked by the Cleanroom developers.

These comparisons suggest that the non-Cleanroom developers focused on a "perspective
of the tester," sometimes leaving out classes of functions and causing a less
completely implemented product and more (especially unique) failures. Off-line review
techniques, however, are more general and their use contributed to more complete
requirement conformance and fewer failures in the Cleanroom products. In addition to
examining the operational properties of the product, various static properties were compared.


4.1.2. Static System Properties

The first question in this goal area concerns the size of the final systems. Figure 3
showed the number of source lines, executable statements, and procedures and functions
for the various systems. The projects from the two groups were not statistically
different (MW > .10) in any of these three size attributes. Another question in this goal
area concerns the readability of the delivered source code. Two aspects of reading and
modifying code are the number of comments present and the density of the "complexity."
In an attempt to capture the complexity density, syntactic complexity [Basili &
Hutchens 83] was calculated and normalized by the number of executable statements.
In addition to control complexity, the syntactic complexity metric considers nesting
depth and prime program decomposition [Linger, Mills & Witt 79]. The developers
using Cleanroom wrote code that was more highly commented (MW = .089) and had a


failures, even though they did better overall. This demonstrates that while reviewing
the code, the Cleanroom developers focused less than the other groups on certain parts
of the system. The more uniform review of the whole system makes the performance of
the system less sensitive to its operational profile. Note that operational environments
of systems are usually difficult to define a priori and are subject to change.
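
The two success percentages discussed here, with and without duplicate failures, can be computed from a sequence of test outcomes as sketched below. The data representation (a failure identifier per failing test case) and the function name are assumptions made for illustration. In particular, the sketch assumes that excluding duplicate failures means dropping them from the denominator, which the text does not state explicitly.

```python
def summarize_outcomes(outcomes):
    """Summarize test outcomes for one final release.

    `outcomes` holds one entry per test case, in execution order: None for
    a success, or a failure identifier (any hashable label) for a failure.
    A failure whose identifier was seen on an earlier test case counts as
    a duplicate failure.
    """
    seen = set()
    successes = unique_failures = duplicate_failures = 0
    for failure_id in outcomes:
        if failure_id is None:
            successes += 1
        elif failure_id in seen:
            duplicate_failures += 1
        else:
            seen.add(failure_id)
            unique_failures += 1
    total = successes + unique_failures + duplicate_failures
    without_duplicates = successes + unique_failures
    return {
        "successes": successes,
        "unique_failures": unique_failures,
        "duplicate_failures": duplicate_failures,
        # Assumption: duplicate failures are dropped from the denominator.
        "pct_success_without_duplicates":
            100.0 * successes / without_duplicates if without_duplicates else 0.0,
        "pct_success_with_duplicates":
            100.0 * successes / total if total else 0.0,
    }
```

A release with many repeats of a single fault scores noticeably better on the first percentage than on the second, which is the pattern the text attributes to the Cleanroom projects.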

Figure 5. Percentage of successful test cases during operational testing (without
duplicate failures).

In both of the product quality measures of implementation completeness and operational
testing results, there was quite a variation in performance.⁵ A wide variation may
have been expected with an unfamiliar development technique, but the developers using
a more traditional approach had a wider range of performance than did those using
Cleanroom in both of the measures (even with twice as many Cleanroom teams). All of
the above differences are magnified by recalling that the non-Cleanroom teams did not
develop their systems in one monolithic step; they (also) had the benefit of periodic


⁵ An alternate perspective includes only the more successful projects from each
group in the comparison of operational product quality. When the best 60% from each
approach are examined (i.e., removing teams 'd,' 'e,' 'A,' 'E,' 'F,' and 'I'), the Mann-Whitney
significance level for comparing implementation completeness becomes .045 and
the significance level for comparing successful test cases (without duplicate failures) becomes
.034. Thus, comparing the best teams from each approach increases the evidence
in favor of Cleanroom in both of these product quality measures.


Figure 4. Requirement conformance of the systems.

[Plot of requirement conformance scores by team (maximum 32); percentage tick
labels 22%, 56%, 91%, 100%.]

Mann-Whitney² signif. = .088

To compare testing results among the systems developed in the two groups, fifty
random user-session test cases were executed on the final release of each system to simulate
its operational environment. If the final release of a system performed to expectations
on a test case, the outcome was called a "success;" if not, the outcome was a
"failure." If the outcome was a "failure" but the same failure was observed on an earlier
test case run on the final release, the outcome was termed a "duplicate failure." Figure
5 shows the percentage of successful test cases when duplicate failures are not included.
The figure shows that Cleanroom projects had a higher percentage of successful test
cases at system delivery.³ When duplicate failures are included, however, the better
performance of the Cleanroom systems is not nearly as significant (MW = .134).⁴ This
is caused by the Cleanroom projects having a relatively higher proportion of duplicate

2 The significance levels for the Mann-Whitney statistics reported are the probability
of Type I error in a one-tailed test.

3 Although not considered here, various software reliability models have been proposed
to forecast system reliability based on failure data [Musa 75, Currit 83, Goel 83].

4 To be more succinct, MW will sometimes be used to abbreviate the significance
level of the Mann-Whitney statistic.
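
The success, failure, and duplicate failure outcomes defined above can be sketched in code as follows. This is only an illustration: the failure identifiers and function names are hypothetical, and the text does not state how the percentage "without duplicate failures" was computed, so the sketch assumes that duplicate failures are simply dropped from the set of test cases counted.

```python
def classify_outcomes(results):
    """Label each test case 'success', 'failure', or 'duplicate failure'.

    `results` holds one entry per test case, in execution order: None if the
    system performed to expectations, otherwise a (hypothetical) identifier
    for the failure that was observed.
    """
    seen_failures = set()
    labels = []
    for failure_id in results:
        if failure_id is None:
            labels.append("success")
        elif failure_id in seen_failures:
            labels.append("duplicate failure")
        else:
            seen_failures.add(failure_id)
            labels.append("failure")
    return labels


def success_percentages(results):
    """Return (percent without duplicate failures, percent with them)."""
    labels = classify_outcomes(results)
    successes = labels.count("success")
    duplicates = labels.count("duplicate failure")
    with_duplicates = 100.0 * successes / len(labels)
    # Assumption: "not included" means duplicates leave the denominator.
    without_duplicates = 100.0 * successes / (len(labels) - duplicates)
    return without_duplicates, with_duplicates


# Hypothetical ten-case session, not data from the study.
session = [None, "F1", None, "F1", "F2", None, "F1", None, None, None]
print(success_percentages(session))
```

Under this reading, a system whose failures are mostly repeats of one fault scores noticeably higher without duplicates than with them.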
4.1. Characterization of the Effect on the Product Developed


This section characterizes the differences between the products delivered by the two
development groups. We first examine some operational properties of the products
and then compare some of their static properties.

4.1.1.  Operational  System  Properties 

In order to contrast the operational properties of the systems delivered by the two
groups, both completeness of implementation and operational testing results were examined.
A measure of implementation completeness was calculated by partitioning the
required system into sixteen logical functions (e.g., send mail to an individual, read a
piece of mail, respond, add yourself to a mailing list, ...). Each function in an implementation
was then assigned a value of two if it completely met its requirements, a
value of one if it partially met them, or zero if it was inoperable. The total for each
system was calculated; a maximum score of 32 was possible. Figure 4 displays this subjective
measure of requirement conformance for the systems. Note that in all figures
presented, the ten teams using Cleanroom are in upper case and the five teams using a
more conventional approach are in lower case. A first observation is that six of the ten
Cleanroom teams built very close to the entire system. While not all of the Cleanroom
teams performed equally well, a majority of them applied the approach effectively
enough to develop nearly the whole product. More importantly, the Cleanroom teams
met the requirements of the system more completely than did the non-Cleanroom teams.
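
The scoring scheme described above amounts to a short calculation, sketched here. The names are hypothetical, the ratings are invented for illustration, and the conversion to a percentage of the 32-point maximum is an assumption about how the score is presented.

```python
FUNCTION_COUNT = 16  # logical functions the required system was partitioned into
MAX_RATING = 2       # 2 = completely met, 1 = partially met, 0 = inoperable


def completeness_score(ratings):
    """Return (total score, percent of the maximum score of 32)."""
    if len(ratings) != FUNCTION_COUNT:
        raise ValueError("expected one rating per logical function")
    if any(rating not in (0, 1, 2) for rating in ratings):
        raise ValueError("each rating must be 0, 1, or 2")
    total = sum(ratings)
    percent = 100.0 * total / (FUNCTION_COUNT * MAX_RATING)
    return total, percent


# Hypothetical system: 12 functions complete, 3 partial, 1 inoperable.
print(completeness_score([2] * 12 + [1] * 3 + [0]))
```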


of failure severity and times between failure took place during the testing process. The
operational statistics referred to later were calculated from fifty user-session test cases
run on the final system release of each team. For a complete explanation of the operationally
based testing process applied to the projects, including test data selection, testing
procedure, and failure observation, see [Selby 84].


4.  Data  Analysis  and  Interpretation 

The analysis and interpretation of the data collected from the study appear in the
following sections, organized by the goal areas outlined earlier. In order to address the
various questions posed under each of the goals, some raw data usually will be presented
and then interpreted. Figure 3 presents the number of source lines, executable statements,
and procedures and functions to give a rough view of the systems developed.


Figure  3. 
applying the techniques of Cleanroom. Several persons mentioned, however, that they
already utilized some of the ideas in Cleanroom. Keeping a simple design supports readability
of the product and facilitates the processes of modification and verification.
Although some of the objective product measures presented earlier showed differences in
development style, these subjective ones are interesting and lend insight into actual programmer
behavior.


One indicator of the impression that something new leaves on people is whether
they would do it again. Figure 12 presents the responses of the individuals when they
were asked whether they would choose to use Cleanroom as either a software development
manager or as a programmer. Even though these responses were gathered
(immediately) after course completion, subjects desiring to "please the instructor" may
have responded favorably to this type of question regardless of their true feelings. Practically
everyone indicated a willingness to apply the approach again. It is interesting to
note that a greater number of persons in a managerial role would choose to always use
it. Of the persons that ranked the reuse of Cleanroom fairly low in each category, four
of the five were the same people. Of the six people that ranked reuse low, four were
from less successful projects (one from team 'A,' one from team 'E,' and two from team
'I'), but the other two came from reasonably successful developments (one from team 'C'
and one from team 'J'). The particular individuals on teams 'E,' 'I,' and 'J' rated the
reuse fairly low in both categories.


Figure 12.

Breakdown of responses to the attitude survey question, "Would you use
Cleanroom again?" (One person did not respond to this question.)

As a software development manager?

 8 - Yes, at all times
14 - Yes, but only for certain projects
 5 - Not at all

As a programmer?

 4 - Yes, for all projects
18 - Yes, but not all the time
 5 - Only if I had to
 0 - I would leave if I had to


4.3.1.  Summary  of  the  Effect  on  the  Developers 

In summary of the effect on the developers, most Cleanroom developers 1) modified
in part their development style, 2) missed program execution, and 3) indicated they
would use the approach again.

4.4.  Distinction  Among  Teams 

In spite of efforts to balance the teams according to various factors (see Case Study
Description), a few differences among the teams were apparent. Two separate Cleanroom
teams, 'H' and 'I,' each lost a member late in the project. Thus at project completion,
there were eight three-person and two two-person Cleanroom teams. Recall that
team 'H' performed quite well according to requirement conformance and testing results,
while team 'I' did poorly. Also, the second group of subjects did not divide evenly into
three-person teams. Since one of those individuals had extensive professional experience,
non-Cleanroom team 'e' consisted of that one highly experienced person. Thus at project
completion, there were four three-person and one one-person non-Cleanroom teams.
Although team 'e' wrote over 1300 source lines, this highly experienced person did not
do as well as the other teams in some respects. This is consistent with another study in
which teams applying a "disciplined methodology" in development outperformed individuals
[Basili & Reiter 81]. Appendix A contains the significance levels for the above
results when team 'e,' when teams 'H' and 'I,' and when teams 'e,' 'H,' and 'I' are
removed from the analysis. Removing teams 'H' and 'I' has little effect on the
significance levels, while the removal of team 'e' causes a decrease in all of the
significance levels except for executable statements, software science effort, cyclomatic
complexity, syntactic complexity, connect-time, and cpu-time.

5.  Conclusions 

This paper describes "Cleanroom" software development - an approach intended to
produce highly reliable software by integrating formal methods for specification and
design, complete off-line development, and statistically based testing. The goal structure,
experimental approach, data analysis, and conclusions are presented for a
replicated-project study examining the Cleanroom approach. This is the first investigation
known to the authors that applied Cleanroom and characterized its effect relative
to a more traditional development approach.

The data analysis presented and the testimony provided by the developers suggest
that the major results of this study are 1) most developers were able to apply the techniques
of Cleanroom effectively; 2) the Cleanroom teams' products met system requirements
more completely and had a higher percentage of successful test cases; 3) the
source code developed using Cleanroom had more comments and less dense complexity;
4) the use of Cleanroom successfully modified aspects of development style; and 5) most
Cleanroom developers indicated they would use the approach again.

It seems that the ideas in Cleanroom help attain the goals of producing high quality
software and increasing the discipline in the software development process. The complete
separation of development from testing appears to cause a modification in the
developers' behavior, resulting in increased process control and in more effective use of
formal methods for software specification, design, off-line review, and verification. It
seems that system modification and maintenance would be more easily done on a product
developed in the Cleanroom method, because of the product's thoroughly conceived
design and higher readability. Thus, achieving high requirement conformance and high
operational reliability coupled with low maintenance costs would help reduce overall
costs, satisfy the user community, and support a long product lifetime.

This empirical study is intended to advance the understanding of the relationship
between introducing discipline into the development process (as in Cleanroom) and
several aspects of product quality: conformance with requirements, high operational reliability,
and easily modifiable source code. The results given were calculated from a set
of teams applying Cleanroom development on a relatively small project - the direct
extrapolation of the findings to other projects and development environments is not
implied. Valuable insights, however, have been gained from the analysis.

7. Appendix A.

Figure 13 presents the measure averages and the significance levels for the above
comparisons when team 'e,' when teams 'H' and 'I,' and when teams 'e,' 'H,' and 'I' are
removed. The significance levels for the Mann-Whitney statistics reported are the probability
of Type I error in a one-tailed test.


Figure 13. Summary of measure averages and significance levels.

Values in each row appear in this column order: average for Cleanroom teams; average
for non-Cleanroom teams; Mann-Whitney significance levels for all teams, without
team e, without teams H,I, and without teams e,H,I. Rows with fewer than six values
are incomplete in the source; their values are listed in the order recovered.

Measure                                      Values
Source lines                                 1320.0  1491.2  .196  .240  .198
Executable stmts                             604.1  625.4  .500  .286  .367
#Procedures & functions                      36.5  40.0  .357  .500  .330  .500
%Implementation completeness                 82.5  60.0  .088  .197  .093  .196
%Successful tests (w/o duplicate failures)   92.5  80.8  .055  .128  .053  .116
%Successful tests (w/ duplicate failures)    78.7  59.2  .134  .285  .151  .304
#Comments                                    194.9  122.2  .089  .102  .190  .198
Syntactic complexity/executable stmts        1.5  1.6  .079  .179  .082
Software Science E                           6728.6e3  7355.4e3  .451  .240  .442
Cyclomatic complexity                        196.8  212.2  .250  .198  .255
Syntactic complexity                         917.5  1017.0  .500  .286  .500  .305
#Global data items                           37.8  24.2  .071  .129  .053  .117
%Assignment stmts                            26.6  129  .040  .087
Off-line effectiveness                       2.5  .065  .098  .098
Connect-time (hr.)                           71.3  .012  .121  .021
Cpu-time (min.)                              71.7  136.1  .017  .072  .009
#Deliveries                                  4.1  2.6  .015  .010  .022


8. References


[Baker 72]

F. T. Baker, Chief Programmer Team Management of Production Programming,
IBM Systems J. 11, 1, pp. 131-149, 1972.

[Baker 81]

F. T. Baker, Chief Programmer Teams, pp. 249-254 in Tutorial on Structured
Programming: Integrated Practices, ed. V. R. Basili and F. T. Baker,
IEEE, 1981.

[Basili & Turner 78]

V. R. Basili and A. J. Turner, SIMPL-T: A Structured Programming
Language, Paladin House Publishers, Geneva, IL, 1978.

[Basili & Reiter 81]

V. R. Basili and R. W. Reiter, A Controlled Experiment Quantitatively
Comparing Software Development Approaches, IEEE Trans. Software Engr.
SE-7, May 1981.

[Basili & Hutchens 83]

V. R. Basili and D. H. Hutchens, An Empirical Study of a Syntactic Metric
Family, IEEE Trans. Software Engr. SE-9, 6, pp. 664-672, Nov. 1983.

[Basili & Selby 84]

V. R. Basili and R. W. Selby, Jr., Data Collection and Analysis in Software
Research and Management, Proceedings of the American Statistical Association
and Biometric Society Joint Statistical Meetings, Philadelphia, PA, August
13-16, 1984.

[Basili & Weiss 84]

V. R. Basili and D. M. Weiss, A Methodology for Collecting Valid Software
Engineering Data, IEEE Trans. Software Engr. SE-10, 6, pp. 728-738, Nov. 1984.

[Currit 83]

P. A. Currit, Cleanroom Certification Model, Proc. Eighth Ann. Software
Engr. Workshop, NASA/GSFC, Greenbelt, MD, Nov. 1983.

[Curtis 83]

B. Curtis, Cognitive Science of Programming, Sixth Minnowbrook Workshop
on Software Performance Evaluation, Blue Mountain Lake, NY, July 19-22,
1983.


[Duran & Ntafos 81]

J. W. Duran and S. Ntafos, A Report on Random Testing, Proc. Fifth Int.
Conf. Software Engr., San Diego, CA, pp. 179-183, March 9-12, 1981.

[Dyer & Mills 82]

M. Dyer and H. D. Mills, Developing Electronic Systems with Certifiable Reliability,
Proc. NATO Conf., Summer, 1982.

[Dyer 82]

M. Dyer, Cleanroom Software Development Method, IBM Federal Systems
Division, Bethesda, MD, October 14, 1982.

[Dyer 83]

M. Dyer, Software Validation in the Cleanroom Development Method, IBM-FSD
Tech. Rep. 86.0003, August 19, 1983.

[Fagan 76]

M. E. Fagan, Design and Code Inspections to Reduce Errors in Program Development,
IBM Sys. J. 15, 3, pp. 182-211, 1976.

[Ferrentino & Mills 77]

A. B. Ferrentino and H. D. Mills, State Machines and Their Semantics in
Software Engineering, Proc. IEEE COMPSAC, 1977.

[Goel 83]

A. L. Goel, A Guidebook for Software Reliability Assessment, Dept. Industrial
Engr. and Operations Research, Syracuse Univ., New York, Tech. Rep.
83-11, April 1983.

[Halstead 77]

M. H. Halstead, Elements of Software Science, North Holland, New York,
1977.

[Hoare 69]

C. A. R. Hoare, An Axiomatic Basis for Computer Programming, Communications
of the ACM 12, 10, pp. 576-583, Oct. 1969.

[Howden 76]

W. E. Howden, Reliability of the Path Analysis Testing Strategy, IEEE
Trans. Software Engr. SE-2, 3, Sept. 1976.

[Linger, Mills & Witt 79]

R. C. Linger, H. D. Mills, and B. I. Witt, Structured Programming: Theory
and Practice, Addison-Wesley, Reading, MA, 1979.


[McCabe 76]

T. J. McCabe, A Complexity Measure, IEEE Trans. Software Engr. SE-2, 4,
pp. 308-320, Dec. 1976.

[Mills 72a]

H. D. Mills, Chief Programmer Teams: Principles and Procedures, IBM
Corp., Gaithersburg, MD, Rep. FSC 71-0012, 1972.

[Mills 72b]

H. D. Mills, Mathematical Foundations for Structural Programming, IBM
Report FSL 72-8021, 1972.

[Musa 75]

J. D. Musa, A Theory of Software Reliability and Its Application, IEEE
Trans. Software Engr. SE-1, 3, pp. 312-327, 1975.

[Myers 76]

G. J. Myers, Software Reliability: Principles & Practices, John Wiley & Sons,
New York, 1976.

[Parnas 72]

D. L. Parnas, On the Criteria to be Used in Decomposing Systems into
Modules, Communications of the ACM 15, 12, pp. 1053-1058, 1972.

[Selby 84]

R. W. Selby, Jr., A Quantitative Approach for Evaluating Software Technologies,
Dept. Com. Sci., Univ. Maryland, College Park, Ph. D. Dissertation,
1984.

[Shankar 82]

K. S. Shankar, A Functional Approach to Module Verification, IEEE Trans.
Software Engr. SE-8, 2, March 1982.

[Thayer, Lipow & Nelson 78]

R. A. Thayer, M. Lipow, and E. C. Nelson, Software Reliability, North-Holland,
Amsterdam, 1978.

